[bitnami/redis] prevent zombie PIDs in redis health checks #3559
Conversation
Hi @kabakaev, thanks for opening this PR! I think it is a good approach to have a shorter timeout in the probe script than in the probe spec, so the probe script can finish safely and we'll avoid the zombie PIDs. I've just got a question regarding this: what would be the best gap between them? Would 1 second be enough? I mean, it could depend on the CPU usage of the cluster, right? Wouldn't it be good to add a new property to let the user configure it?
Signed-off-by: Alexander Kabakaev <kabakaev@gmail.com>
Force-pushed from 52404fc to 5011763
@dani8art, thanks for the quick review!

The scripts are already written in a way that guarantees the probe ends within the exact time specified, maybe only a few milliseconds longer than configured. Thus, it's not the Kubernetes object spec's … Moreover, setting the timeout to …

I'm not aware of any security issues with a shared process namespace. In fact, a shared PID namespace was the default setting for kubernetes@docker for a while.

I'm fine with that. The first commit already fixes the issue for the default settings of the liveness/readiness probes. I've updated the second commit accordingly. What about …?
I think we should tackle that as well; are you able to open a PR too?
That's right, but could you set it …? Regarding the timeout, it is OK to me; let's see how it works and whether there is any issue with that. If someone comes with an issue, then we will change it to a separate parameter. Thank you so much for the detailed explanation.
Force-pushed from 5011763 to 5405328
I said that it's switched to …. The change is pushed properly now.
Sure, will do.
LGTM!
Just a small change.
Thank you so much!
Signed-off-by: Alexander Kabakaev <kabakaev@gmail.com>
Force-pushed from 5405328 to dc99958
Description of the change
Though PR #2453 decreased the probability of zombie PID generation, the problem still exists; see "Steps to reproduce" below.

The same issue probably exists in the redis-cluster chart. Let me know if a PR for redis-cluster is needed.
This PR fixes #2441 once and for all.
Benefits
Whatever a user may write into the readiness or liveness probes, no more zombie PIDs will accumulate. The default settings will also stop generating zombies randomly.
Possible drawbacks
None.
Applicable issues

- fixes #2441
Root cause

Both `livenessProbe` and `readinessProbe` are configured as an `exec` of a shell script. Each script has a configurable timeout parameter, and the same timeout value is used in the `timeoutSeconds` parameter of the probe spec. This creates a race condition: if the probe script exits after the `kubelet` Go runtime schedules the timeout handler, then `kubelet` fires the probe timeout procedure, leaving the probe script behind as a zombie PID. From what I see, such a race happens more often on nodes with high CPU utilization.

This PR increases the `timeoutSeconds` parameter by 1 second, giving the probe script enough time to finish and let its exit code reach `kubelet`.
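The race, and why a one-second gap avoids it, can be sketched outside Kubernetes with coreutils `timeout` standing in for the script's internal deadline (the numbers below are illustrative, not the chart's defaults):

```shell
# Sketch only, not the chart's actual probe script: keep the script's own
# deadline 1s shorter than the probe spec's timeoutSeconds, so the script
# always exits (and is reaped by its shell) before kubelet's handler fires.
SCRIPT_TIMEOUT=2                        # illustrative script-internal deadline
PROBE_TIMEOUT=$((SCRIPT_TIMEOUT + 1))   # what this PR sets as timeoutSeconds
timeout "$SCRIPT_TIMEOUT" sleep 10      # stands in for a hung redis-cli call
rc=$?                                   # 124: killed by `timeout`, not by kubelet
echo "timeoutSeconds=$PROBE_TIMEOUT rc=$rc"   # prints "timeoutSeconds=3 rc=124"
```

Because the inner `timeout` always wins, the script's own shell reaps the killed command and kubelet only ever sees a clean exit code.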
The second commit also enables the `shareProcessNamespace` pod option, which is the universal solution to the problem of zombie PID accumulation. Details are given below.

Steps to reproduce
Tested in kind.

1. Install the redis chart:

   ```console
   helm upgrade test1 ./bitnami/redis --install --set=cluster.enabled=false,usePassword=false,persistence.enabled=false,metrics.enabled=false,master.livenessProbe.enabled=false
   ```

2. Make sure the pod is running.
3. Run a `watch ps` inside the container.
4. Find the redis process on the host and stop it.
5. Look back at the `ps` watch inside the container to see the zombie PIDs ("defunct").
6. "Unpause" redis with `kill -CONT 30042`. The zombie PIDs will remain.
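The zombie mechanics themselves can also be shown without Kubernetes at all; a minimal sketch, assuming a Linux `ps` from procps (which supports `--ppid` and reports state `Z` for zombies):

```shell
# A child exits while its parent never calls wait(): until it is reaped,
# ps shows it as Z/<defunct>, just like probe scripts abandoned by kubelet.
sh -c 'sleep 1 & exec sleep 3' &   # after exec, `sleep 3` is the parent of `sleep 1`
PARENT=$!
sleep 2                            # inner sleep has exited; `sleep 3` never reaps it
ZOMBIES=$(ps --ppid "$PARENT" -o stat= | grep -c Z)
echo "zombies=$ZOMBIES"            # prints "zombies=1" on Linux
wait "$PARENT"                     # clean up the stand-in parent
```

The `exec` trick just manufactures a parent (`sleep 3`) that, like kubelet's abandoned probe shells, never waits on its child.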
Why even bother about zombie PIDs?

Though it may seem unimportant, a high number of zombies may lead to all sorts of problems, from inability to make a redis snapshot to dead kubernetes daemons. With health probes running every 5 seconds, it's quite easy to reach a pod's PID limit.
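For reference, here is roughly what the second commit's change amounts to at the pod-spec level. This is a hand-written sketch, not the chart's actual template; the pod name, image tag, and probe command are made up for illustration:

```yaml
# Sketch: with shareProcessNamespace, the pause container is PID 1 for every
# container in the pod and reaps any probe script that kubelet leaves behind.
apiVersion: v1
kind: Pod
metadata:
  name: redis-probe-demo            # hypothetical name
spec:
  shareProcessNamespace: true       # one PID namespace for the whole pod
  containers:
    - name: redis
      image: bitnami/redis:latest   # illustrative tag
      livenessProbe:
        exec:
          command: ["sh", "-c", "timeout 4 redis-cli ping"]  # script deadline: 4s
        timeoutSeconds: 5           # 1s longer, per the first commit
```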
Shared process namespace is good

In kubernetes, the `shareProcessNamespace: true` option has a side effect: all containers run with a proper init process. Specifically, the first process of the pod gets PID 1 (init), and it is also PID 1 in all other containers. In kubernetes, the first pod process is always `/pause` from the pause container (part of the network setup). This mighty binary can read the exit codes of orphaned processes and thus prevent zombies from exhausting the pod/node PID limit.

Such a shared PID namespace mode was enabled by default in k8s@docker for some time, but with the introduction of the `shareProcessNamespace` option, the shared PID NS was switched back to disabled by default.

Thus, an easy solution would be to enable `shareProcessNamespace` on all pods which use an `exec`
health check.

Otherwise, in order to avoid leaving zombies behind, each exec script must be written to finish in a guaranteed time shorter than the k8s probe `timeoutSeconds` parameter. This is nearly impossible to do for user-defined health checks.

Checklist
- [ ] Chart version bumped in `Chart.yaml` according to semver.
- [ ] Title of the PR starts with chart name (e.g. `[bitnami/chart]`)
- [ ] If the chart contains a `values-production.yaml` apart from `values.yaml`, ensure that you implement the changes in both files